ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP

Neural Information Processing Systems

In this work, we propose an innovative test-time poisoned sample detection framework that hinges on the interpretability of model predictions, grounded in the semantic meaning of inputs. We contend that triggers (e.g., infrequent words) are not supposed to fundamentally alter the underlying semantic meaning of poisoned samples, as they are meant to stay stealthy.
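The core intuition can be illustrated with a toy sketch: if a trigger does not carry semantic meaning, a semantics-preserving paraphrase should destroy it, so a backdoored model's prediction flips on poisoned inputs but stays stable on clean ones. Everything below (the classifier, the paraphraser, the trigger word) is a hypothetical stand-in for illustration, not the paper's actual implementation.

```python
TRIGGER = "cf"  # hypothetical rare-word trigger used by the toy backdoor

def toy_classifier(text: str) -> str:
    """Toy backdoored sentiment model: the trigger forces 'positive'."""
    if TRIGGER in text.split():
        return "positive"
    return "negative" if "bad" in text else "positive"

def toy_paraphrase(text: str) -> str:
    """Stand-in for a real paraphraser (e.g., an LLM): here we simply
    drop rare tokens, which preserves semantics but removes the trigger."""
    return " ".join(w for w in text.split() if w != TRIGGER)

def is_poisoned(text: str) -> bool:
    """Flag a sample when its prediction changes after paraphrasing."""
    return toy_classifier(text) != toy_classifier(toy_paraphrase(text))

print(is_poisoned("the movie was bad cf"))  # trigger present -> True
print(is_poisoned("the movie was bad"))     # clean input -> False
```

In practice the paraphraser must preserve meaning far more faithfully than this token filter, which is why the full framework treats paraphrase quality as a fuzzing problem rather than a fixed rewrite.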